C4.5 is an algorithm used to generate a decision tree, developed by Ross Quinlan.〔Quinlan, J. R. ''C4.5: Programs for Machine Learning''. Morgan Kaufmann Publishers, 1993.〕 C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier. It became widely known after being ranked #1 in the ''Top 10 Algorithms in Data Mining'' paper published by Springer in 2008.〔(Umd.edu - Top 10 Algorithms in Data Mining )〕

==Algorithm==
C4.5 builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy. The training data is a set <math>S = s_1, s_2, \dots</math> of already classified samples. Each sample <math>s_i</math> consists of a p-dimensional vector <math>(x_{1,i}, x_{2,i}, \dots, x_{p,i})</math>, where the <math>x_j</math> represent attribute values or features of the sample, as well as the class in which <math>s_i</math> falls.

At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (difference in entropy). The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then recurses on the smaller sublists.

This algorithm has a few base cases:
*All the samples in the list belong to the same class. When this happens, C4.5 simply creates a leaf node for the decision tree saying to choose that class.
*None of the features provide any information gain. In this case, C4.5 creates a decision node higher up the tree using the expected value of the class.
*An instance of a previously unseen class is encountered. Again, C4.5 creates a decision node higher up the tree using the expected value.
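The normalized information gain (gain ratio) criterion described above can be sketched in Python. This is a minimal illustration of the split-selection step only, not a full C4.5 implementation; the sample format (a list of attribute dictionaries) and the toy weather dataset are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(samples, labels, attr):
    """Information gain of splitting on attr, normalized by the
    split information (entropy of the attribute's value distribution).

    samples: list of dicts mapping attribute name -> value (illustrative format)
    labels:  the class label of each sample
    """
    total = len(samples)
    # Partition the class labels by the value each sample takes on attr.
    partitions = {}
    for s, c in zip(samples, labels):
        partitions.setdefault(s[attr], []).append(c)
    # Expected entropy after the split, weighted by subset size.
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values,
    # correcting ID3's bias toward high-arity attributes.
    split_info = entropy([s[attr] for s in samples])
    return gain / split_info if split_info > 0 else 0.0

# Toy training data (hypothetical): should we play outside?
samples = [
    {"outlook": "sunny",    "windy": False},
    {"outlook": "sunny",    "windy": True},
    {"outlook": "overcast", "windy": False},
    {"outlook": "rain",     "windy": False},
    {"outlook": "rain",     "windy": True},
]
labels = ["no", "no", "yes", "yes", "no"]

# At a tree node, C4.5 picks the attribute with the highest gain ratio.
best = max(samples[0], key=lambda a: gain_ratio(samples, labels, a))
print(best, gain_ratio(samples, labels, best))
```

The recursive tree construction would then partition the samples by the chosen attribute's values and repeat this selection on each sublist until one of the base cases above is reached.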